Using a Probabilistic Model of Context to Detect Word Obfuscation

نویسندگان

  • Sanaz Jabbari
  • Ben Allison
  • Louise Guthrie
چکیده

This paper proposes a distributional model of word use and word meaning which is derived purely from a body of text, and then applies this model to determine whether certain words are used in or out of context. We suggest that we can view the contexts of words as multinomially distributed random variables. We illustrate how using this basic idea, we can formulate the problem of detecting whether or not a word is used in context as a likelihood ratio test. We also define a measure of semantic relatedness between a word and its context using the same model. We assume that words that typically appear together are related, and thus have similar probability distributions and that words used in an unusual way will have probability distributions which are dissimilar from those of their surrounding context. The relatedness of a word to its context is based on Kullback-Leibler divergence between probability distributions assigned to the constituent words in the given sentence. We employed our methods on a defense-oriented application where certain words are substituted with other words in an intercepted communication.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...

متن کامل

یک مدل موضوعی احتمالاتی مبتنی بر روابط محلّی واژگان در پنجره‌های هم‌پوشان

A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...

متن کامل

ASIC design protection against reverse engineering during the fabrication process using automatic netlist obfuscation design flow

Fab-less business model in semiconductor industry has led to serious concerns about trustworthy hardware. In untrusted foundries and manufacturing companies, submitted layout may be analyzed and reverse engineered to steal the information of a design or insert malicious Trojans. Understanding the netlist topology is the ultimate goal of the reverse engineering process. In this paper, we propose...

متن کامل

A Model of E-Loyalty and Word-Of-Mouth based on e-trust in E-banking services (Case Study: Mellat Bank)

Customers extend robust trust to a business when they believe the business puts their interests first. Good experience of banking services and recommendations of other customers can increase trust. Loyalty and Word of mouth (WOM) is accepted as key factors successes of marketing. This paper seeks to discover the affecting factors on positive word of mouth and loyalty based on trust enhancement ...

متن کامل

First Language Activation during Second Language Lexical Processing in a Sentential Context

 Lexicalization-patterns, the way words are mapped onto concepts, differ from one language      to another. This study investigated the influence of first language (L1) lexicalization patterns on the processing of second language (L2) words in sentential contexts by both less proficient and more proficient Persian learners of English. The focus was on cases where two different senses of a polys...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008